ABSTRACT

The transputer parallel processing lab at NASA Lewis Research Center consists 
of 69 processors (transputers) that can be connected into various networks for 
use in general purpose concurrent processing applications. The main goal of 
the lab is to develop concurrent scientific and engineering application
programs that will take advantage of the computational speed increases available
on a parallel processing system over a sequential processing system. 

Because many scientific and engineering applications of interest generate large 
volumes of raw data, it is often convenient to display results in a graphic 
format. Since the analyses are performed on the transputer system, a package 
of graphics manipulation and display routines has been developed to also run 
on that system. This reduces the need for transferring data to other systems 
for viewing and postprocessing. 

The transputer multiprocessor graphics display program uses techniques that 
would be of value in almost any concurrent application. Some of the topics 
studied in the lab include interprocessor communication time versus computation 
time, handling and simulation of global variables on processors with only 
local memory, and process synchronization. 

The current implementation of the graphics program uses two processors to
perform all of the graphics computations. The display processor board performs
the low-level, device coordinate scan-conversion tasks and drives a CRT
monitor. This low-level operating environment is normally transparent to the
applications programmer although, if necessary, graphics applications can be 
developed using the low-level routines. The applications programmer normally 
interfaces with the other processor, the two-dimensional processor. At this 
level, all graphics operations can be performed in a two-dimensional world 
space. Standard two-dimensional operations such as rotation, translation, and 
scaling can be performed using the provided routines. Other routines allow 
multiple windows to be manipulated individually and allow screens and windows 
to be double buffered for smooth animation. 


*Senior Research Associate (work funded under Space Act Agreement
C99066G).

Future enhancements to the graphics system will include extensions to
three-dimensional space. This would probably involve adding one or more processors
to the current two in order to keep drawing speeds sufficiently fast. 



OVERVIEW 


WHAT IS A TRANSPUTER? 

A transputer is a microcomputer with its own local memory and with links that 
can be used to connect it to other transputers. A transputer can be used in a 
single processor system, or in networks to build high-performance concurrent 
systems. The following figure was adapted from INMOS (1986). 

BENEFITS OF TRANSPUTERS - FLEXIBLE CONNECTION ARCHITECTURE

Transputers can be used to build low-cost, high-speed concurrent networks. 
Flexible connection architecture allows optimum configuration for a wide range 
of problems. The following figure was adapted from INMOS (1986). 

TRANSPUTER PARALLEL PROCESSING LABORATORY FACILITIES


The transputer parallel processing laboratory facilities include the following: 

(1) Forty 32-bit floating point transputers with 256 KBytes memory per 
transputer 

(2) Twenty-seven 16-bit integer processors - 24 with 8 KBytes of high-speed 
memory and 3 with 64 KBytes of high-speed memory 

(3) One 32-bit transputer-based medium-performance graphics display board with 
512 by 512 pixel resolution and capable of displaying 256 out of 262 144 
colors at one time 

(4) One 32-bit transputer-based development board with 2 MBytes of memory. 

The development board plugs into an IBM PC slot. System development
software is run on this board. 


[Figure: labels GRAPHICS MONITOR and PC-BASED DEVELOPMENT SYSTEM]

CD-88-32006




BENEFITS OF MULTIPROCESSOR GRAPHICS COMPUTATIONS 


• ALLOWS ANALYSIS AND POST PROCESSING TO BE PERFORMED ON ONE SYSTEM 


• USES MULTIPROCESSING TECHNIQUES FOR INCREASED PERFORMANCE 


• TECHNIQUES DEVELOPED SHOW HOW CAREFUL ANALYSIS OF COMPUTATION 
VERSUS COMMUNICATION CAN BE USED FOR DETERMINING PERFORMANCE OF 
CONCURRENT ALGORITHMS 


CD-88-32007 

POSTER PRESENTATION


WHAT IS A TRANSPUTER? 


A transputer is a microcomputer with its own local memory and with links that 
can be used to connect one transputer to another transputer. 

A typical member of the transputer family is a single-chip very large scale 
integration (VLSI) device that contains a processor, memory, and serial links 
for point-to-point communication between transputers. A transputer can be used 
in a single processor system or in networks to build high-performance
concurrent systems (INMOS, 1986).

Some of the transputers currently available include a 16-bit transputer with 
four serial links and 2K of on-chip memory; a 32-bit transputer with four 
serial links and 2K of on-chip memory; and a 32-bit transputer with four serial 
links, 4K of on-chip memory, and a built-in floating point unit. The serial 
links can transfer data at 10 or 20 Mbit/sec. 

The block diagram of the floating point version of the transputer chip is shown 
below. 

PROGRAMMING NETWORKS OF TRANSPUTERS


Transputers can be programmed in high-level languages such as FORTRAN, C, and 
Pascal. To take full advantage of concurrent programming capabilities,
transputers can be programmed in Occam. Occam takes advantage of the multitasking
and communication features built into the transputer architecture. 

The Occam software building block is the process. A system is designed as an
interconnected set of processes. Each process communicates with other processes
through point-to-point channels. Process-to-process communication is
automatically synchronized without user intervention.

The following figure shows three processes that are running on either a single 
processor or a network of three processors. The Occam code fragments show how 
easy it is to change the mapping from a single transputer to a network of 
transputers.
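
The process-and-channel model described above can be sketched outside Occam. The following is a minimal Python analogue, not Occam code and not the code from the figure: three hypothetical processes (`producer`, `squarer`, `consumer`) run as threads and communicate only through point-to-point queues. The one-slot `queue.Queue` is an assumption standing in for an Occam channel; a true Occam channel is unbuffered, so this only approximates its synchronization.

```python
import queue
import threading


def producer(out_ch, n):
    """First process: emit n integers, then an end-of-stream marker."""
    for i in range(n):
        out_ch.put(i)
    out_ch.put(None)


def squarer(in_ch, out_ch):
    """Middle process: square each value received and pass it on."""
    while True:
        value = in_ch.get()
        if value is None:
            out_ch.put(None)
            return
        out_ch.put(value * value)


def consumer(in_ch, results):
    """Last process: collect values until the end-of-stream marker."""
    while True:
        value = in_ch.get()
        if value is None:
            return
        results.append(value)


def run_pipeline(n):
    # One-slot queues stand in for Occam channels (an approximation:
    # real Occam channels are unbuffered and rendezvous on every transfer).
    a_to_b = queue.Queue(maxsize=1)
    b_to_c = queue.Queue(maxsize=1)
    results = []
    procs = [
        threading.Thread(target=producer, args=(a_to_b, n)),
        threading.Thread(target=squarer, args=(a_to_b, b_to_c)),
        threading.Thread(target=consumer, args=(b_to_c, results)),
    ]
    for p in procs:
        p.start()
    for p in procs:
        p.join()
    return results
```

Calling `run_pipeline(5)` returns `[0, 1, 4, 9, 16]`. Because the three processes share nothing but their channels, their bodies would not need to change if each were placed on its own processor, which is the property the Occam mapping relies on.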

TRANSPUTER PARALLEL PROCESSING LABORATORY FACILITIES

The transputer parallel processing laboratory facilities consist of the
hardware and software described below. All of this equipment can fit on a desktop
and requires no special cooling or power. 


HARDWARE 

• IBM AT-COMPATIBLE PC THAT ACTS AS THE SYSTEM FILE SERVER

• ONE 32-BIT TRANSPUTER DEVELOPMENT SYSTEM WITH 2M DRAM (PLUGS INTO 
PC SLOT) 

• FORTY 32-BIT FLOATING-POINT TRANSPUTERS WITH 256K DRAM PER TRANSPUTER 
FOR A TOTAL OF 10M 

• TWENTY-SEVEN 16-BIT TRANSPUTERS - 24 WITH 8K SRAM AND 3 WITH 64K SRAM


• ONE GRAPHICS BOARD CONTAINING ONE 32-BIT TRANSPUTER, 512K PROGRAM 
MEMORY, 512K DUAL-PORT VIDEO MEMORY, AND A HIGH-PERFORMANCE COLOR 
LOOK-UP TABLE CAPABLE OF DISPLAYING 256 OUT OF 262 144 COLORS AT ONE TIME 

• ONE HIGH-PERFORMANCE MULTIFREQUENCY RGB ANALOG MONITOR 

SOFTWARE 

• TRANSPUTER DEVELOPMENT SYSTEM (TDS) CONTAINING EDITOR, VARIOUS 
UTILITIES, AND AN EMBEDDED OCCAM COMPILER 

TRANSPUTER CABINETS WITH ASSOCIATED GRAPHICS DISPLAY MONITOR


The transputer cabinets are desktop size and can easily hold 80 or more
transputers. Note the backplane wiring, which can be changed to create various
processor interconnection architectures.

TYPICAL TRANSPUTER BOARD

This photograph shows a typical transputer board. It contains four processors 
(transputers) each with 256 KBytes of memory. 

TRANSPUTER GRAPHICS SYSTEM:
USER INTERFACE PROCEDURES 


Applications programmers make calls to graphics routines provided in the
package. The code is not available as a library; instead, the source code is
included in each applications program. The user is insulated from the details of
the graphics system, and only high-level graphics function calls are required.

The user defines a model in two-dimensional, real-coordinate space. Window 
size and placement on the screen are controlled in normalized device coordinates
(screen size is from 0 to 1 on each axis). Multiple windows are allowed. The 
user can generate a global transformation matrix to perform scaling, rotation, 
and translation of the model data base. 
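
As an illustration of how a global transformation matrix can combine scaling, rotation, and translation, here is a minimal Python sketch using 3 by 3 homogeneous matrices. The function names (`scaling`, `rotation`, `translation`, `compose`, `apply`) and the column-vector convention are assumptions made for this example; they are not the routine names or conventions of the package.

```python
import math


def identity():
    return [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]


def scaling(sx, sy):
    return [[sx, 0.0, 0.0], [0.0, sy, 0.0], [0.0, 0.0, 1.0]]


def rotation(theta):
    """Counterclockwise rotation about the origin, theta in radians."""
    c, s = math.cos(theta), math.sin(theta)
    return [[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]]


def translation(tx, ty):
    return [[1.0, 0.0, tx], [0.0, 1.0, ty], [0.0, 0.0, 1.0]]


def matmul(a, b):
    """Product of two 3x3 matrices."""
    return [[sum(a[i][k] * b[k][j] for k in range(3)) for j in range(3)]
            for i in range(3)]


def compose(*steps):
    """Build one matrix that applies the given steps in the order listed."""
    m = identity()
    for step in steps:
        m = matmul(step, m)  # later steps multiply on the left
    return m


def apply(m, x, y):
    """Transform the point (x, y) by the homogeneous matrix m."""
    return (m[0][0] * x + m[0][1] * y + m[0][2],
            m[1][0] * x + m[1][1] * y + m[1][2])
```

For example, `compose(scaling(2, 2), rotation(math.pi / 2), translation(10, 0))` maps the model point (1, 0) to approximately (10, 2): the point is scaled to (2, 0), rotated to (0, 2), and then translated. Composing the steps once means every model point needs only a single matrix multiplication.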

GRAPHICS DISPLAY SYSTEM ARCHITECTURE


The current implementation of the graphics display system uses two processors. 
The two-dimensional world processor converts the user's model from
two-dimensional world space to device coordinates. The appropriate commands are
sent to the graphics display board in device coordinates, and the picture is 
displayed on the graphics CRT. 

TWO-DIMENSIONAL WORLD PROCESSOR


The two-dimensional world processor converts the user's model from
two-dimensional world space to normalized device coordinates. The user specifies
all the drawing commands using two-dimensional world coordinates. Viewport 
sizing is performed in normalized device coordinates. The conversion to device 
coordinates is transparent to an application programmer. Multiple windows are 
allowed. Maintenance of global window parameters is transparent to the user. 
Copies of global window parameters are kept on both the two-dimensional world 
processor and the graphics board since there is no shared memory on this 
system. 
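
The coordinate conversion described above can be sketched as a two-step mapping: world coordinates to normalized device coordinates, then normalized device coordinates to pixels. The Python below is a minimal sketch under stated assumptions. The function names and the `(x0, y0, x1, y1)` tuple layout are hypothetical, the 512 by 512 pixel size is taken from the graphics board description, and no vertical axis flip is applied because the board's pixel origin is not specified here.

```python
def world_to_ndc(x, y, window, viewport):
    """Map a world-space point into normalized device coordinates.

    window   -- (x0, y0, x1, y1) region of world space being viewed
    viewport -- (x0, y0, x1, y1) region of the 0..1 screen it is drawn into
    """
    wx0, wy0, wx1, wy1 = window
    vx0, vy0, vx1, vy1 = viewport
    u = vx0 + (x - wx0) / (wx1 - wx0) * (vx1 - vx0)
    v = vy0 + (y - wy0) / (wy1 - wy0) * (vy1 - vy0)
    return u, v


def ndc_to_device(u, v, width=512, height=512):
    """Map normalized device coordinates (0..1) to integer pixel coordinates."""
    # Adding 0.5 before truncating rounds to the nearest pixel.
    px = int(u * (width - 1) + 0.5)
    py = int(v * (height - 1) + 0.5)
    return px, py
```

With a window of (-10, -10, 10, 10) and a viewport covering the upper-right quarter of the screen, (0.5, 0.5, 1.0, 1.0), the world point (10, 10) maps to normalized device coordinates (1.0, 1.0) and then to pixel (511, 511). Keeping the window and viewport tuples on each processor is one way to picture the duplicated global window parameters.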

COMPUTATION AND COMMUNICATION PERFORMANCE


The architecture of the graphics display system is primarily dictated by
computation versus communication times. Since there is only one transputer driving
the video memory, two times must be compared: the time to perform a computation
remotely (on another processor), communicate the computed data to the display
board, and copy the result into video memory; and the time to compute it on the
display board and put it into video memory.

For the case of line scan conversion, the communication time dominates. For
this reason, scan-conversion tasks are performed on the display board's
processor. Since most drawings use multiple straight lines, a pipeline of two
processors is currently being used for the graphics system.
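
To make the communication cost concrete, the sketch below scan converts a line into pixel coordinates. The algorithm used on the display board is not named here; Bresenham's integer line algorithm is swapped in as a common choice, and the function name `scan_convert_line` is hypothetical.

```python
def scan_convert_line(x0, y0, x1, y1):
    """Return the integer pixels on the line from (x0, y0) to (x1, y1).

    Bresenham's algorithm, used here as an assumed stand-in for the
    display board's scan-conversion routine.
    """
    dx = abs(x1 - x0)
    dy = -abs(y1 - y0)
    sx = 1 if x0 < x1 else -1
    sy = 1 if y0 < y1 else -1
    err = dx + dy  # running error term, kept in integers
    pixels = []
    x, y = x0, y0
    while True:
        pixels.append((x, y))
        if x == x1 and y == y1:
            break
        e2 = 2 * err
        if e2 >= dy:  # step in x
            err += dy
            x += sx
        if e2 <= dx:  # step in y
            err += dx
            y += sy
    return pixels
```

For the line from (0,0) to (511,511), this produces 512 pixel coordinates. Scan converting on a remote processor means sending all of those coordinates over a link, whereas a draw-line command needs only the two endpoints, which is consistent with communication time being the limiting factor.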

The following table shows the actual timings* for the graphics operations. 

Note that it is quicker to use the normal graphics board commands to draw a
line (14 887 µsec) than to precompute the line on another processor and
send that data to the graphics board for display (36 339 µsec).


OPERATIONS PERFORMED                            TIME,* µsec

SCAN CONVERT LINE FROM (0,0) TO (511,511)             7 933
SEND LINE DATA TO DISPLAY BOARD                      12 512
SEND DATA AND DISPLAY                                28 400
SCAN CONVERT, SEND DATA, AND DISPLAY                 36 339
GRAPHICS BOARD DRAW LINE COMMAND                     14 887
GRAPHICS BOARD FAST DRAW LINE COMMAND                 3 542


*TIMINGS TYPICALLY VARY UP TO 0.1 PERCENT.
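
The comparison in the table can be worked through numerically. The short Python sketch below copies the tabulated times (in microseconds) and computes how much slower the remote path is than the two display board commands. The dictionary keys and the helper name `slowdown` are labels chosen for this example only.

```python
# Times in microseconds, copied from the table of graphics operations.
TIMINGS_USEC = {
    "scan_convert": 7933,
    "send_line_data": 12512,
    "send_and_display": 28400,
    "remote_total": 36339,       # scan convert, send data, and display
    "board_draw_line": 14887,
    "board_fast_draw_line": 3542,
}


def slowdown(remote_usec, local_usec):
    """Ratio of the remote time to the local time for the same line."""
    return remote_usec / local_usec


vs_draw = slowdown(TIMINGS_USEC["remote_total"],
                   TIMINGS_USEC["board_draw_line"])
vs_fast = slowdown(TIMINGS_USEC["remote_total"],
                   TIMINGS_USEC["board_fast_draw_line"])

# The separately measured steps add up to nearly the measured total.
summed = TIMINGS_USEC["scan_convert"] + TIMINGS_USEC["send_and_display"]
relative_gap = abs(summed - TIMINGS_USEC["remote_total"]) / TIMINGS_USEC["remote_total"]
```

The remote path is roughly 2.4 times slower than the normal draw line command and roughly 10 times slower than the fast draw line command. The scan-convert time plus the send-and-display time comes to 36 333 µsec, which agrees with the measured total to within the stated 0.1 percent variation.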


CD-88-32015 





SUMMARY 


A package of two-dimensional graphics routines has been developed to run on a 
transputer-based parallel processing system. These routines have been designed 
to enable applications programmers to easily generate and display results from 
the transputer network in a graphic format. 

The graphics procedures have been designed for the lowest possible network
communication overhead for increased performance. The routines have also been
designed for ease of use and to present an intuitive approach to generating 
graphics on the transputer parallel processing system. 